Access data query and extraction computer using a web browser (200)
Select one or more descriptions of data of interest within a category (202)
Provide values for one or more extraction parameters (204)
Extract data using values for extraction parameters and descriptions of data of interest (206)
Collect and present extracted results on a web page (208)
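
The flow of steps 202 through 208 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Description`, `select_descriptions`, `run_query`, and `render_html` are hypothetical, and the extractors are stubs that return canned rows instead of contacting any web site.

```python
import html
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass
class Description:
    """Hypothetical record for one description of data of interest."""
    name: str
    category: str
    # Takes extraction parameter values, returns extracted rows.
    extract: Callable[[Dict[str, str]], List[Row]]


def select_descriptions(catalog: List[Description], category: str,
                        names: List[str]) -> List[Description]:
    """Step 202: pick descriptions of data of interest within one category."""
    return [d for d in catalog if d.category == category and d.name in names]


def run_query(selected: List[Description], params: Dict[str, str]) -> List[Row]:
    """Steps 204-206: apply the parameter values to every selected description."""
    rows: List[Row] = []
    for d in selected:
        for row in d.extract(params):
            rows.append({"source": d.name, **row})
    return rows


def render_html(rows: List[Row]) -> str:
    """Step 208: present the collected results as a minimal HTML table."""
    if not rows:
        return "<p>No results</p>"
    columns = list(rows[0].keys())
    head = "".join(f"<th>{html.escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(r.get(c, ''))}</td>" for c in columns) + "</tr>"
        for r in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


# Stub extractors (assumption): real ones would fetch and parse web pages.
catalog = [
    Description("store_a", "books",
                lambda p: [{"title": p["keyword"] + " guide", "price": "10.00"}]),
    Description("store_b", "books",
                lambda p: [{"title": p["keyword"] + " handbook", "price": "12.50"}]),
    Description("fares", "travel", lambda p: []),
]

selected = select_descriptions(catalog, "books", ["store_a", "store_b"])
rows = run_query(selected, {"keyword": "python"})
page = render_html(rows)
```

Keeping each description's extractor behind a common call signature lets one set of parameter values be applied to several sites in the same category.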

Start
Identify a web site having data of interest (300)
Does the data of interest fit into an existing category? (302)
Identify extraction parameters, e.g. inputs (304)
Identify portions of data of interest, e.g. outputs (306)
Add a new description of data of interest inside selected category (308)
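
One possible way to record the outcome of steps 300 through 308 is a small catalog keyed by category. The sketch below is hypothetical throughout: `SiteDescription`, `Catalog`, and the example URLs are invented for illustration, and creating a missing category on the fly is an assumption, since the figure does not show what happens when the answer at step 302 is no.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SiteDescription:
    """Hypothetical description of data of interest for one web site."""
    site_url: str        # step 300: the web site having data of interest
    inputs: List[str]    # step 304: extraction parameters
    outputs: List[str]   # step 306: portions of data of interest


@dataclass
class Catalog:
    categories: Dict[str, List[SiteDescription]] = field(default_factory=dict)

    def add_description(self, category: str, desc: SiteDescription) -> bool:
        """Step 308: add the description; return True if the category existed (step 302)."""
        existed = category in self.categories
        # Assumption: a category that does not exist yet is simply created.
        self.categories.setdefault(category, []).append(desc)
        return existed


catalog = Catalog({"books": []})
existed = catalog.add_description(
    "books",
    SiteDescription("https://books.example", ["keyword"], ["title", "author", "price"]),
)
created = catalog.add_description(
    "travel",
    SiteDescription("https://fares.example", ["origin", "destination"], ["fare"]),
)
```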

Start
Has all of the data of interest been extracted? (400)
Yes
Identify a portion of the data of interest on the web site using sample extraction parameters to retrieve a sample page (402)
Generate an extractor pattern to match the portion of data of interest on the sample page (404)
Select an instruction from a predefined list of instructions (406)
Use the extractor pattern in conjunction with the selected instruction to extract a portion of data of interest. Assign the extracted data to one or more outputs, or store the extracted data for subsequent use.
Improve the instruction by adding optional features including pruning, dissection, and branching/looping (410)
Sequence the instruction among other instructions for this description of data of interest (412)
Test sequenced instructions on various extraction parameters (414)
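
The pairing of an extractor pattern with an instruction can be illustrated as follows. This is a minimal sketch under several assumptions: the extractor pattern is modelled as a regular expression, the predefined list holds only two hypothetical instruction kinds (`assign_output` and `store`), and pruning is modelled as narrowing the page to the first match of a second pattern before extraction. Dissection and branching/looping are not shown.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical predefined list of instructions.
PREDEFINED_KINDS = ("assign_output", "store")


@dataclass
class Instruction:
    kind: str                    # must be one of PREDEFINED_KINDS
    pattern: str                 # extractor pattern (a regex with one group)
    target: str                  # output name, or name of the stored value
    prune: Optional[str] = None  # optional regex; only its first match is searched


def run_instructions(
    page: str, instructions: List[Instruction]
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Run the sequenced instructions in order against one sample page."""
    outputs: Dict[str, List[str]] = {}
    stored: Dict[str, List[str]] = {}
    for ins in instructions:
        if ins.kind not in PREDEFINED_KINDS:
            raise ValueError(f"unknown instruction kind: {ins.kind}")
        text = page
        if ins.prune:
            # Pruning: discard everything outside the region of interest.
            m = re.search(ins.prune, page, re.S)
            text = m.group(0) if m else ""
        values = re.findall(ins.pattern, text, re.S)
        dest = outputs if ins.kind == "assign_output" else stored
        dest[ins.target] = values
    return outputs, stored


# Invented sample page: an advertisement block followed by the results block.
SAMPLE_PAGE = """
<div class="ad"><b>Sponsored title</b></div>
<div id="results">
  <b>First Book</b> <i>$10.00</i> <a href="/more?page=2">More Records</a>
  <b>Second Book</b> <i>$12.50</i>
</div>
"""

RESULTS_BLOCK = r'<div id="results">.*?</div>'

instructions = [
    Instruction("assign_output", r"<b>(.*?)</b>", "title", prune=RESULTS_BLOCK),
    Instruction("assign_output", r"<i>\$([\d.]+)</i>", "price", prune=RESULTS_BLOCK),
    Instruction("store", r'<a href="([^"]+)">More Records</a>', "next_link"),
]

outputs, stored = run_instructions(SAMPLE_PAGE, instructions)
```

Without the pruning pattern, the title instruction would also pick up the text inside the advertisement block, which is the kind of "not of interest" content the figures mark for exclusion.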

Web Page
Not of Interest
Top selections: 1, 2, 3
Not of Interest

Unit 1
Not of Interest
Not of Interest
Unit n
Not of Interest
More Records link
Not of Interest
500, 502
Many web sites display the same, or similar, information twice. (504)
Portion of Data of Interest 1
Not of Interest
Link - Follow to Portion of Data of Interest 2
Not of Interest
506B
By detecting a link indicating that there is additional data conforming to the extraction parameters, additional web pages can be retrieved to accumulate additional units for extraction. (508)
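
Following a "More Records" link to accumulate units can be sketched as a simple loop. The code below is an assumption-laden illustration, not the method of the figures: the unit and link patterns, the `collect_units` helper, and the in-memory `PAGES` dictionary standing in for a web site are all hypothetical. Skipping a unit that has already been collected is one possible response to sites that display the same information twice.

```python
import re
from typing import Callable, List, Optional, Set

# Hypothetical patterns for one invented page layout.
UNIT_RE = re.compile(r'<li class="unit">(.*?)</li>', re.S)
MORE_RE = re.compile(r'<a href="([^"]+)">More Records</a>')


def collect_units(first_url: str, fetch: Callable[[str], str],
                  max_pages: int = 10) -> List[str]:
    """Accumulate units across pages by following the More Records link."""
    units: List[str] = []
    seen_urls: Set[str] = set()
    url: Optional[str] = first_url
    # Stop when no link is found, a page repeats, or the page limit is hit.
    while url and url not in seen_urls and len(seen_urls) < max_pages:
        seen_urls.add(url)
        page = fetch(url)
        for unit in UNIT_RE.findall(page):
            unit = unit.strip()
            if unit not in units:  # skip information displayed twice
                units.append(unit)
        m = MORE_RE.search(page)
        url = m.group(1) if m else None
    return units


# In-memory stand-in for a web site (assumption): no network access is made.
PAGES = {
    "/search?page=1": (
        '<ul><li class="unit">Unit 1</li><li class="unit">Unit 2</li></ul>'
        '<a href="/search?page=2">More Records</a>'
    ),
    "/search?page=2": (
        '<ul><li class="unit">Unit 2</li><li class="unit">Unit 3</li></ul>'
    ),
}

units = collect_units("/search?page=1", PAGES.__getitem__)
```

The set of visited URLs and the page limit guard against a site whose link points back to a page that was already retrieved.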

1100
2300
2500
Philip Ueil Mass Market Paperback / Published 1990